Search results for "log analysis"
Showing 6 of 6 documents
Improving clustering of Web bot and human sessions by applying Principal Component Analysis
2019
The paper addresses the problem of modeling Web sessions of bots and legitimate users (humans) as feature vectors for use as input to classification models. Many different features for discriminating bots’ and humans’ navigational patterns have been considered in session models so far, but very few studies have been devoted to feature selection and dimensionality reduction in the context of bot detection. We propose applying Principal Component Analysis (PCA) to develop improved session models based on predictor variables that are efficient discriminants of Web bots. The proposed models are used in session clustering, whose performance is evaluated in terms of the purity …
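The general approach the abstract describes can be sketched as follows. This is a hedged illustration, not the authors' code: the session features, their values, and the number of components are invented assumptions.

```python
# Sketch: reduce session feature vectors with PCA before clustering
# bot/human sessions. Feature names and values below are illustrative.
import numpy as np
from numpy.linalg import eigh

def pca_reduce(X, n_components=2):
    """Project rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    vals, vecs = eigh(cov)                     # eigendecomposition (ascending)
    order = np.argsort(vals)[::-1]             # components by descending variance
    components = vecs[:, order[:n_components]]
    return Xc @ components

# Toy session features: [requests, mean inter-request time (s), % images, % HEAD]
sessions = np.array([
    [120, 0.2, 0.01, 0.30],   # bot-like: fast, few images, many HEAD requests
    [115, 0.1, 0.02, 0.25],
    [ 15, 8.0, 0.60, 0.00],   # human-like
    [ 20, 6.5, 0.55, 0.01],
])
reduced = pca_reduce(sessions, n_components=2)
print(reduced.shape)  # (4, 2)
```

The reduced vectors would then feed a clustering step, with cluster quality evaluated by purity as the abstract indicates.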
Modeling a non-stationary bots’ arrival process at an e-commerce Web site
2017
The paper concerns the issue of modeling and generating a representative Web workload for Web server performance evaluation through simulation experiments. Web traffic analysis has been conducted for two decades, usually based on Web server log data. However, while the character of the overall Web traffic has been extensively studied and modeled, relatively few studies have been devoted to the analysis of Web traffic generated by Internet robots (Web bots). Moreover, the overwhelming majority of studies concern traffic on non-e-commerce websites. In this paper, we address the problem of modeling a realistic arrival process of bots’ requests on an e-commerce Web server. Based on real…
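One standard way to generate a non-stationary arrival process of the kind this abstract studies is Lewis-Shedler thinning of a nonhomogeneous Poisson process. The sketch below assumes a hypothetical daily rate cycle; it does not reproduce the paper's actual bot-arrival model.

```python
# Sketch: simulate a non-stationary arrival process via thinning.
# The rate function is an invented daily cycle, not the paper's model.
import math
import random

def rate(t):
    """Hypothetical time-of-day rate (requests/min) with a daily cycle."""
    return 5 + 4 * math.sin(2 * math.pi * t / 1440)  # 1440 min = 1 day

def simulate_arrivals(horizon, rate_fn, rate_max, seed=42):
    random.seed(seed)
    t, arrivals = 0.0, []
    while True:
        t += random.expovariate(rate_max)            # candidate from max-rate process
        if t > horizon:
            break
        if random.random() < rate_fn(t) / rate_max:  # accept with prob rate(t)/rate_max
            arrivals.append(t)
    return arrivals

times = simulate_arrivals(horizon=1440, rate_fn=rate, rate_max=9)
print(len(times))  # expected count ≈ 1440 * 5 = 7200 over the simulated day
```

Such a synthetic trace could then drive server-performance simulation experiments as the abstract describes.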
Verification of Web traffic burstiness and self-similarity for multiple online stores
2017
Developing realistic Web traffic models is essential for a reliable Web server performance evaluation. Very significant Web traffic properties that have been identified so far include burstiness and self-similarity. Very few relevant studies have been devoted to e-commerce traffic, however. In this paper, we investigate burstiness and self-similarity factors for seven different online stores using their access log data. Our findings show that both features are present in all the analyzed e-commerce datasets. Furthermore, a strong correlation of the Hurst parameter with the average request arrival rate was discovered (0.94). Estimates of the Hurst parameter for the Web traffic in the online …
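Self-similarity of the kind this abstract investigates is commonly quantified via the Hurst parameter H. One textbook estimator is the aggregated-variance method: for a self-similar series, Var(X^(m)) ~ m^(2H-2), so the slope of log-variance against log-block-size gives 2H - 2. The sketch below uses synthetic uncorrelated data (for which H ≈ 0.5); the papers' access-log series and exact estimation procedure are not reproduced.

```python
# Sketch: aggregated-variance estimate of the Hurst parameter H.
# Synthetic i.i.d. data stands in for a real request-arrival series.
import numpy as np

def hurst_aggregated_variance(series, block_sizes):
    logs_m, logs_var = [], []
    for m in block_sizes:
        n = len(series) // m
        blocks = series[: n * m].reshape(n, m).mean(axis=1)  # aggregate at level m
        logs_m.append(np.log(m))
        logs_var.append(np.log(blocks.var()))
    slope, _ = np.polyfit(logs_m, logs_var, 1)   # slope = 2H - 2
    return 1 + slope / 2

rng = np.random.default_rng(0)
iid = rng.normal(size=100_000)                   # uncorrelated series: H ≈ 0.5
print(round(hurst_aggregated_variance(iid, [1, 2, 4, 8, 16, 32, 64]), 2))
```

Estimates well above 0.5, as the abstract reports for e-commerce traffic, indicate long-range dependence.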
Practical Aspects of Log File Analysis for E-Commerce
2013
The paper concerns Web server log file analysis to discover knowledge useful for online retailers. Data from one month of an online bookstore's operation were analyzed with respect to the probability of e-customers making a purchase. Key states and characteristics of user sessions were distinguished, and their relations to the session state connected with purchase confirmation were analyzed. The results allow identification of factors increasing the probability of making a purchase in a given Web store and thus determination of user sessions which are more valuable in terms of e-business profitability. Such results may then be applied in practice, e.g. in a method for personalized or prioritize…
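The kind of analysis described above can be illustrated with a toy computation: estimating the probability of purchase conditional on a session visiting a given state. The session data and state names below are invented for the example, not taken from the paper.

```python
# Toy sketch: P(purchase | session visited state), from labeled sessions.
# Sessions and state names are invented for illustration.
from collections import defaultdict

sessions = [
    {"states": {"home", "product", "cart", "confirm"}, "purchase": True},
    {"states": {"home", "search", "product"}, "purchase": False},
    {"states": {"home", "product", "cart"}, "purchase": False},
    {"states": {"home", "search", "product", "cart", "confirm"}, "purchase": True},
]

visits = defaultdict(int)   # sessions that visited each state
buys = defaultdict(int)     # of those, sessions ending in a purchase
for s in sessions:
    for state in s["states"]:
        visits[state] += 1
        buys[state] += s["purchase"]

for state in sorted(visits):
    print(f"P(purchase | visited {state}) = {buys[state] / visits[state]:.2f}")
```

States whose conditional purchase probability is high mark the more valuable sessions the abstract refers to.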
Syslog-protokollan viestien analysointi järjestelmän vianetsinnän apuna
2017
Information systems continuously collect log data about their operation. In failure situations, this log data can be used to help locate the source of the problem. The Syslog protocol has become the industry standard for logging, and a need has therefore arisen for log analysis techniques and tools developed for it. This bachelor's thesis presents these analysis techniques and tools and considers their usefulness from the perspective of a system administrator performing system troubleshooting.
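As a small illustration of the kind of syslog analysis such troubleshooting relies on, a BSD-style (RFC 3164) log line can be parsed to recover its facility, severity, and host. The message below is the example line from RFC 3164; the regex is a simplification, not a full parser.

```python
# Sketch: parse a BSD-style (RFC 3164) syslog line.
# The PRI field encodes facility * 8 + severity.
import re

LINE = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
PATTERN = re.compile(
    r"<(?P<pri>\d+)>(?P<timestamp>\w{3} [ \d]\d \d\d:\d\d:\d\d) "
    r"(?P<host>\S+) (?P<msg>.*)"
)

m = PATTERN.match(LINE)
pri = int(m.group("pri"))
facility, severity = divmod(pri, 8)   # <34> -> facility 4 (auth), severity 2 (crit)
print(facility, severity, m.group("host"))  # 4 2 mymachine
```

Filtering by severity and facility in this way is a first step toward locating the failing component.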
Adaptive framework for network traffic classification using dimensionality reduction and clustering
2012
Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic, which offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs, and analyzing these logs could be a way to detect intrusions either periodically or in real time. We propose a framework that preprocesses and analyzes these log files. HTTP queries are transformed to numerical matrices using n-gram analysis. The dimensionality of these matrices is reduced using principal component analysis and diffusion map methodology. Abnormal log lines can then …
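The preprocessing step described above, turning HTTP query strings into numerical vectors via character n-grams, can be sketched as follows. The n-gram size and the sample queries are illustrative assumptions; the paper's exact pipeline (including the diffusion-map stage) is not reproduced.

```python
# Sketch: build a count matrix over character n-grams of HTTP queries.
# Queries and n-gram size are illustrative.
from collections import Counter

def char_ngrams(s, n=2):
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def ngram_matrix(queries, n=2):
    """Return a (num_queries x vocab_size) count matrix and its vocabulary."""
    counts = [Counter(char_ngrams(q, n)) for q in queries]
    vocab = sorted(set().union(*counts))
    return [[c[g] for g in vocab] for c in counts], vocab

queries = [
    "/search?q=books",
    "/search?q=logs",
    "/search?q=' OR 1=1--",   # injection-like query stands out in n-gram space
]
matrix, vocab = ngram_matrix(queries)
print(len(matrix), len(vocab))
```

The resulting matrix would then be dimensionality-reduced (e.g. with PCA) so that anomalous queries separate from normal traffic.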